I. Appendix: proofs
A. Proof of Theorem 1 

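Proof: For a rule set $A$, we will show that if any rule $z$ has support smaller than some constant $C$, then removing it yields a better objective, i.e., $A \notin \arg\max_{A'} F(A')$. Assume rule $z \in A$ is such a rule, and let $A\setminus z$ denote $A$ with rule $z$ removed:

$$A\setminus z = \{a \mid a \in A,\ a \neq z\}.$$

Our goal is to find conditions on $C$ such that

$$F(A) < F(A\setminus z). \tag{1}$$

Assume that in rule set $A$, $M_l$ rules come from the pool $\mathcal{A}_l$ of rules with length $l$, $l \in \{1,\dots,L\}$, and that rule $z$ has length $l'$, so it is drawn from $\mathcal{A}_{l'}$. Let TP, FP, TN and FN be the numbers of true positives, false positives, true negatives and false negatives in $S$ classified by $A$, and let $N_+ = \mathrm{TP}+\mathrm{FN}$ and $N_- = \mathrm{TN}+\mathrm{FP}$ be the numbers of positive and negative data points in $S$. We now compute the likelihood for model $A\setminus z$. The most extreme case, i.e., the one in which removing $z$ hurts the likelihood the most, is when $z$ is an accurate rule that applies only to real positive data points and those data points satisfy only $z$. Therefore, once $z$ is removed, the number of true positives decreases by $C$ and the number of false negatives increases by $C$.

Step 1: Relate $P(S\mid A\setminus z)$ to $P(S\mid A)$.

$$P(S\mid A\setminus z) = \frac{B(\mathrm{TP}-C+\alpha_+,\ \mathrm{FP}+\beta_+)}{B(\alpha_+,\beta_+)}\cdot\frac{B(\mathrm{TN}+\alpha_-,\ \mathrm{FN}+C+\beta_-)}{B(\alpha_-,\beta_-)} = P(S\mid A)\cdot g(C), \tag{2}$$

where

$$g(C) = \frac{\Gamma(\mathrm{TP}+\alpha_+-C)\,\Gamma(\mathrm{TP}+\mathrm{FP}+\alpha_++\beta_+)}{\Gamma(\mathrm{TP}+\alpha_+)\,\Gamma(\mathrm{TP}+\mathrm{FP}+\alpha_++\beta_+-C)}\cdot\frac{\Gamma(\mathrm{FN}+\beta_-+C)\,\Gamma(\mathrm{TN}+\mathrm{FN}+\alpha_-+\beta_-)}{\Gamma(\mathrm{FN}+\beta_-)\,\Gamma(\mathrm{TN}+\mathrm{FN}+\alpha_-+\beta_-+C)}. \tag{3}$$

Now we break down $g(C)$ to find a lower bound for it. The first two terms in (3) become

$$\frac{(\mathrm{TP}+\mathrm{FP}+\alpha_++\beta_+-C)\cdots(\mathrm{TP}+\mathrm{FP}+\alpha_++\beta_+-1)}{(\mathrm{TP}+\alpha_+-C)\cdots(\mathrm{TP}+\alpha_+-1)} \ge \left(\frac{\mathrm{TP}+\mathrm{FP}+\alpha_++\beta_+-1}{\mathrm{TP}+\alpha_+-1}\right)^{C} \ge \left(\frac{N_++\alpha_++\beta_+-1}{N_++\alpha_+-1}\right)^{C}. \tag{4}$$

The first inequality in (4) holds because each factor $\frac{\mathrm{TP}+\mathrm{FP}+\alpha_++\beta_+-k}{\mathrm{TP}+\alpha_+-k}$, $k = 1,\dots,C$, is smallest at $k = 1$; the second holds because $\mathrm{FP} \ge 0$ and $\mathrm{TP} \le N_+$, with equality when $\mathrm{TP} = N_+$ and $\mathrm{FP} = 0$. Similarly, the last two terms in (3) become

$$\frac{(\mathrm{FN}+\beta_-)\cdots(\mathrm{FN}+\beta_-+C-1)}{(\mathrm{TN}+\mathrm{FN}+\alpha_-+\beta_-)\cdots(\mathrm{TN}+\mathrm{FN}+\alpha_-+\beta_-+C-1)} \ge \left(\frac{\mathrm{FN}+\beta_-}{\mathrm{FN}+\mathrm{TN}+\alpha_-+\beta_-}\right)^{C} \ge \left(\frac{\beta_-}{N_-+\alpha_-+\beta_-}\right)^{C}. \tag{5}$$

The first inequality in (5) holds because each factor $\frac{\mathrm{FN}+\beta_-+k}{\mathrm{TN}+\mathrm{FN}+\alpha_-+\beta_-+k}$, $k = 0,\dots,C-1$, is smallest at $k = 0$; the second holds because this ratio increases with FN and decreases with TN, and $\mathrm{FN} \ge 0$, $\mathrm{TN} \le N_-$, with equality when $\mathrm{TN} = N_-$ and $\mathrm{FN} = 0$. Combining (2), (3), (4) and (5), we obtain

$$P(S\mid A\setminus z) \ge \left(\frac{N_++\alpha_++\beta_+-1}{N_++\alpha_+-1}\cdot\frac{\beta_-}{N_-+\alpha_-+\beta_-}\right)^{C} P(S\mid A). \tag{6}$$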
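As a numerical sanity check of (6), the sketch below evaluates the ratio $g(C)$ from (2)-(3) exactly through log-Beta functions and compares it with the right-hand side of (6) over every small confusion matrix. It assumes the Beta-Binomial likelihood written in (2), takes $N_+ = \mathrm{TP}+\mathrm{FN}$ and $N_- = \mathrm{TN}+\mathrm{FP}$, and uses hypothetical hyperparameter values; the function names are illustrative only.

```python
from math import lgamma, log


def log_beta(a: float, b: float) -> float:
    """log B(a, b) computed via log-gamma to avoid overflow."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def log_likelihood(tp, fp, tn, fn, ap, bp, am, bm):
    """log P(S | A) for the Beta-Binomial likelihood appearing in (2)."""
    return (log_beta(tp + ap, fp + bp) - log_beta(ap, bp)
            + log_beta(tn + am, fn + bm) - log_beta(am, bm))


def log_g(C, tp, fp, tn, fn, ap, bp, am, bm):
    """log g(C): removing z turns C true positives into false negatives."""
    return (log_likelihood(tp - C, fp, tn, fn + C, ap, bp, am, bm)
            - log_likelihood(tp, fp, tn, fn, ap, bp, am, bm))


def log_rate(n_pos, n_neg, ap, bp, am, bm):
    """log of the quantity raised to the power C in (6)."""
    return (log((n_pos + ap + bp - 1) / (n_pos + ap - 1))
            + log(bm / (n_neg + am + bm)))


# Hypothetical hyperparameters alpha_+, beta_+, alpha_-, beta_-.
AP, BP, AM, BM = 2.0, 3.0, 2.0, 5.0

# Check (6) on every small confusion matrix with 1 <= C <= TP.
for tp in range(1, 8):
    for fp in range(4):
        for tn in range(5):
            for fn in range(4):
                n_pos, n_neg = tp + fn, tn + fp
                for C in range(1, tp + 1):
                    lhs = log_g(C, tp, fp, tn, fn, AP, BP, AM, BM)
                    rhs = C * log_rate(n_pos, n_neg, AP, BP, AM, BM)
                    assert lhs >= rhs - 1e-9
```

With $C = 1$, $\mathrm{TP} = N_+$, $\mathrm{FP} = 0$, $\mathrm{TN} = N_-$ and $\mathrm{FN} = 0$, both sides coincide, which is the equality case noted after (4) and (5); for $C > 1$ the first inequalities in (4) and (5) are strict.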


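Step 2: Relate $p(A\setminus z)$ to $p(A)$.

$A\setminus z$ consists of the same rules as $A$ except that it is missing one rule from $\mathcal{A}_{l'}$. We must have

$$\begin{aligned}
p(A\setminus z) &= \frac{B(M_{l'}-1+\alpha_{l'},\ |\mathcal{A}_{l'}|-M_{l'}+1+\beta_{l'})}{B(\alpha_{l'},\beta_{l'})}\prod_{l\neq l'}\frac{B(M_l+\alpha_l,\ |\mathcal{A}_l|-M_l+\beta_l)}{B(\alpha_l,\beta_l)} \\
&= p(A)\,\frac{B(M_{l'}-1+\alpha_{l'},\ |\mathcal{A}_{l'}|-M_{l'}+1+\beta_{l'})}{B(M_{l'}+\alpha_{l'},\ |\mathcal{A}_{l'}|-M_{l'}+\beta_{l'})} \\
&= p(A)\,\frac{|\mathcal{A}_{l'}|-M_{l'}+\beta_{l'}}{M_{l'}-1+\alpha_{l'}}.
\end{aligned}$$

The ratio $\frac{|\mathcal{A}_{l'}|-M_{l'}+\beta_{l'}}{M_{l'}-1+\alpha_{l'}}$ decreases monotonically as $M_{l'}$ increases, so it is lower bounded at the upper bound on $M_{l'}$, namely $m_{l'}$. Therefore

$$\frac{|\mathcal{A}_{l'}|-M_{l'}+\beta_{l'}}{M_{l'}-1+\alpha_{l'}} \ge \frac{|\mathcal{A}_{l'}|-m_{l'}+\beta_{l'}}{m_{l'}-1+\alpha_{l'}}.$$

Thus

$$p(A\setminus z) \ge \max_{l}\left(\frac{|\mathcal{A}_l|-m_l+\beta_l}{m_l-1+\alpha_l}\right) p(A). \tag{7}$$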
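The closed form of the prior ratio in Step 2 can be checked by evaluating $p(A)$ directly before and after removing one rule. The sketch below assumes the product-of-Beta prior shown in Step 2; the pool sizes, rule counts and $(\alpha_l, \beta_l)$ are arbitrary illustrative values, and `log_prior` and `prior_ratio_removing_one` are hypothetical helper names.

```python
from math import exp, lgamma


def log_beta(a: float, b: float) -> float:
    """log B(a, b) via log-gamma."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def log_prior(counts, pool_sizes, alphas, betas):
    """log p(A) = sum_l [log B(M_l + a_l, |A_l| - M_l + b_l) - log B(a_l, b_l)]."""
    return sum(log_beta(M + a, n - M + b) - log_beta(a, b)
               for M, n, a, b in zip(counts, pool_sizes, alphas, betas))


def prior_ratio_removing_one(M, n, a, b):
    """Closed form of p(A without z) / p(A) when z is one of M rules taken from a pool of size n."""
    return (n - M + b) / (M - 1 + a)


# Hypothetical rule-set configuration: M_l rules drawn from pools of size |A_l|.
counts = [3, 5, 2]
pool_sizes = [50, 400, 1000]
alphas = [1.0, 1.0, 1.0]
betas = [50.0, 400.0, 1000.0]

# Remove one rule from the second pool and compute the ratio both ways.
reduced = [3, 4, 2]
direct = exp(log_prior(reduced, pool_sizes, alphas, betas)
             - log_prior(counts, pool_sizes, alphas, betas))
closed_form = prior_ratio_removing_one(5, 400, 1.0, 400.0)
print(direct, closed_form)  # both are approximately 159.0
```

Evaluating `prior_ratio_removing_one` for increasing `M` also illustrates the monotone decrease that justifies substituting the upper bound $m_{l'}$.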


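Step 3: Combine Step 1 and Step 2. Combining (6) and (7), the joint probability of $S$ and $A\setminus z$ is bounded by

$$P(S, A\setminus z) = p(A\setminus z)\,P(S\mid A\setminus z) \ge \max_{l}\left(\frac{|\mathcal{A}_l|-m_l+\beta_l}{m_l-1+\alpha_l}\right)\left(\frac{N_++\alpha_++\beta_+-1}{N_++\alpha_+-1}\cdot\frac{\beta_-}{N_-+\alpha_-+\beta_-}\right)^{C} P(S, A).$$

To get $P(S, A\setminus z) > P(S, A)$, it suffices that the factor multiplying $P(S, A)$ exceed 1. Since

$$\frac{N_++\alpha_++\beta_+-1}{N_++\alpha_+-1}\cdot\frac{\beta_-}{N_-+\alpha_-+\beta_-} < 1$$

by the assumption in the theorem's statement, taking logarithms shows that this holds whenever

$$C < \frac{\log \max_{l}\left(\frac{|\mathcal{A}_l|-m_l+\beta_l}{m_l-1+\alpha_l}\right)}{\log\left(\frac{N_++\alpha_+-1}{N_++\alpha_++\beta_+-1}\cdot\frac{N_-+\alpha_-+\beta_-}{\beta_-}\right)}.$$

This means that if $\exists z \in A$ with $\mathrm{supp}(z) < C$, then $A \notin \arg\max_{A'} F(A')$, which is equivalent to saying that for any $z \in A^*$,

$$\mathrm{supp}(z) \ge \frac{\log \max_{l}\left(\frac{|\mathcal{A}_l|-m_l+\beta_l}{m_l-1+\alpha_l}\right)}{\log\left(\frac{N_++\alpha_+-1}{N_++\alpha_++\beta_+-1}\cdot\frac{N_-+\alpha_-+\beta_-}{\beta_-}\right)}.$$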
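The support threshold derived above can be evaluated directly from the hyperparameters. The sketch below implements the right-hand side of the final inequality as written, including the maximum over $l$, and raises an error when the theorem's assumption (the base of the power in (6) being below 1) fails. The function name and all numeric inputs are hypothetical.

```python
from math import log


def support_threshold(pool_sizes, max_counts, alphas, betas,
                      n_pos, n_neg, ap, bp, am, bm):
    """Right-hand side of the support bound at the end of the proof of Theorem 1.

    pool_sizes[l] = |A_l|, max_counts[l] = m_l, alphas/betas = (alpha_l, beta_l);
    n_pos, n_neg = N_+, N_-; ap, bp, am, bm = alpha_+, beta_+, alpha_-, beta_-.
    """
    prior_term = max((n - m + b) / (m - 1 + a)
                     for n, m, a, b in zip(pool_sizes, max_counts, alphas, betas))
    rate = ((n_pos + ap + bp - 1) / (n_pos + ap - 1)) * (bm / (n_neg + am + bm))
    if rate >= 1:
        raise ValueError("the assumption of Theorem 1 requires rate < 1")
    # log(1 / rate) is the denominator of the bound.
    return log(prior_term) / log(1.0 / rate)


threshold = support_threshold(
    pool_sizes=[100, 500], max_counts=[3, 3], alphas=[1.0, 1.0], betas=[100.0, 500.0],
    n_pos=200, n_neg=800, ap=1.0, bp=1.0, am=1.0, bm=100.0)
print(threshold)
```

For these inputs the prior term is $997/3$ and the threshold is roughly 2.65; since supports are integers, any rule in $A^*$ would then have support at least 3.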


B. Proof of Theorem 2 

Proof: Since $A^* \in \arg\max_A F(A)$, $F(A^*) \ge v^*(t)$, i.e.,

$$\log p(S\mid A^*) + \log p(A^*) \ge v^*(t). \tag{8}$$

We then upper bound the two terms on the left-hand side.

Let $M_l^*$ denote the number of rules of length $l$ in $A^*$. The prior probability of selecting $A^*$ from $\mathcal{A}$ is

$$p(A^*) = p(\emptyset)\prod_{l=1}^{L}\frac{B(M_l^*+\alpha_l,\ |\mathcal{A}_l|-M_l^*+\beta_l)}{B(\alpha_l,\ |\mathcal{A}_l|+\beta_l)}.$$

We observe that when $0 \le M_l^* \le |\mathcal{A}_l|$,

$$B(M_l^*+\alpha_l,\ |\mathcal{A}_l|-M_l^*+\beta_l) \le B(\alpha_l,\ |\mathcal{A}_l|+\beta_l),$$


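giving

$$\begin{aligned}
p(A^*) &\le p(\emptyset)\,\frac{B(M_{l'}^*+\alpha_{l'},\ |\mathcal{A}_{l'}|-M_{l'}^*+\beta_{l'})}{B(\alpha_{l'},\ |\mathcal{A}_{l'}|+\beta_{l'})} \\
&= p(\emptyset)\,\frac{\alpha_{l'}(\alpha_{l'}+1)\cdots(M_{l'}^*+\alpha_{l'}-1)}{(|\mathcal{A}_{l'}|-M_{l'}^*+\beta_{l'})\cdots(|\mathcal{A}_{l'}|+\beta_{l'}-1)} \\
&\le p(\emptyset)\left(\frac{M_{l'}^*+\alpha_{l'}-1}{|\mathcal{A}_{l'}|+\beta_{l'}-1}\right)^{M_{l'}^*}
\end{aligned} \tag{9}$$

for all $l' \in \{1,\dots,L\}$. The likelihood is upper bounded by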


$$p(S\mid A^*) \le \Lambda^*. \tag{10}$$
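The observation preceding (9) and the bound (9) itself can be checked one prior factor at a time. The sketch below assumes $\alpha_l \le \beta_l$ (an assumption of this sketch, under which every factor of the product form in (9) is at most 1 and non-decreasing in its index) and uses arbitrary grid values; `log_prior_factor` and `log_bound_9` are hypothetical helper names.

```python
from math import lgamma, log


def log_beta(a: float, b: float) -> float:
    """log B(a, b) via log-gamma."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def log_prior_factor(M, n, a, b):
    """log [B(M + a, n - M + b) / B(a, n + b)]: the l-th factor of p(A*) / p(empty set)."""
    return log_beta(M + a, n - M + b) - log_beta(a, n + b)


def log_bound_9(M, n, a, b):
    """log of the last expression in (9) divided by p(empty set)."""
    if M == 0:
        return 0.0  # empty product: the factor equals 1
    return M * log((M + a - 1) / (n + b - 1))


# Grid over pool sizes n = |A_l|, counts 0 <= M <= n, and alpha_l <= beta_l (assumed).
for n in (5, 20, 100):
    for a in (1.0, 2.0):
        for b in (a, a + 3.0, 50.0):
            for M in range(n + 1):
                factor = log_prior_factor(M, n, a, b)
                assert factor <= 1e-9  # observation before (9)
                assert factor <= log_bound_9(M, n, a, b) + 1e-9  # inequality (9)
```

For $M = 1$ the last step of (9) is an equality, since the product then has a single factor $\alpha_{l'}/(|\mathcal{A}_{l'}|+\beta_{l'}-1)$.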


Plugging (9) and (10) into (8), we get

$$\log \Lambda^* + \log p(\emptyset) + M_{l'}^*\log\left(\frac{M_{l'}^*+\alpha_{l'}-1}{|\mathcal{A}_{l'}|+\beta_{l'}-1}\right) \ge v^*(t), \tag{11}$$

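which gives

$$M_{l'}^* \le \frac{\log \Lambda^* + \log p(\emptyset) - v^*(t)}{\log\left(\frac{|\mathcal{A}_{l'}|+\beta_{l'}-1}{m_{l'}^{[t-1]}+\alpha_{l'}-1}\right)} = m_{l'}^{[t]} \quad \text{for all } l' \in \{1,\dots,L\}. \tag{12}$$

(12) follows because $M_{l'}^* \le m_{l'}^{[t-1]}$, so that $\log\frac{|\mathcal{A}_{l'}|+\beta_{l'}-1}{M_{l'}^*+\alpha_{l'}-1} \ge \log\frac{|\mathcal{A}_{l'}|+\beta_{l'}-1}{m_{l'}^{[t-1]}+\alpha_{l'}-1}$. Thus

$$|A^*| = \sum_{l=1}^{L} M_l^* \le \sum_{l=1}^{L} m_l^{[t]}.$$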
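The bound (12) can be iterated as the search progresses. The sketch below performs one update from $m_l^{[t-1]}$ to $m_l^{[t]}$ and sums the caps to bound $|A^*|$. It assumes that the logarithm of the likelihood bound in (10), $\log p(\emptyset)$ and $v^*(t)$ have already been computed; all numeric inputs and function names are hypothetical. When the logarithm in the denominator of (12) is not positive the update is undefined, and this sketch simply keeps the previous cap, which still bounds $M_l^*$.

```python
from math import log


def update_caps(prev_caps, pool_sizes, alphas, betas, log_lik_bound, log_p_empty, v_star):
    """One application of (12): returns m_l^[t] for each rule length l."""
    numerator = log_lik_bound + log_p_empty - v_star
    caps = []
    for m_prev, n, a, b in zip(prev_caps, pool_sizes, alphas, betas):
        base = m_prev + a - 1
        if base <= 0 or (n + b - 1) / base <= 1:
            caps.append(m_prev)  # (12) is not informative here; keep the previous cap
        else:
            caps.append(numerator / log((n + b - 1) / base))
    return caps


def rule_set_size_bound(caps):
    """|A*| = sum_l M_l^* <= sum_l m_l^[t]."""
    return sum(caps)


pool_sizes = [1000, 5000]
alphas, betas = [1.0, 1.0], [1000.0, 5000.0]
caps_1 = update_caps([50, 50], pool_sizes, alphas, betas,
                     log_lik_bound=-100.0, log_p_empty=-5.0, v_star=-130.0)
caps_2 = update_caps(caps_1, pool_sizes, alphas, betas,
                     log_lik_bound=-100.0, log_p_empty=-5.0, v_star=-130.0)
print(caps_1, rule_set_size_bound(caps_1))
print(caps_2, rule_set_size_bound(caps_2))
```

Because each $M_l^*$ is an integer, rounding each cap down keeps the bound valid; a second update with the tighter caps shrinks them further, as the example shows.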


